RTM at SemEval-2017 Task 1: Referential Translation Machines for Predicting Semantic Similarity

Author

  • Ergun Biçici
Abstract

We use referential translation machines (RTMs) for predicting the semantic similarity of text in all STS tasks, which this year cover Arabic, English, Spanish, and Turkish. RTMs pioneer a language-independent approach to semantic similarity and remove the need to access any task- or domain-specific information or resource. RTMs rank 6th out of 52 submissions in Spanish to English STS. We average prediction scores using weights based on training performance to improve overall performance.

1 Referential Translation Machines (RTMs)

The semantic textual similarity (STS) task (Cer et al., 2017) at SemEval-2017 (Bethard et al., 2017) is about quantifying the degree of similarity between two given sentences S1 and S2 in the same language or in different languages. RTMs use interpretants, data close to the task instances, to derive features measuring the closeness of the test sentences to the training data and the difficulty of translating them, and to identify translation acts between any two data sets for building prediction models. RTMs are applicable in different domains and tasks and in both monolingual and bilingual settings. Figure 1 depicts RTMs and explains the model building process. RTMs use ParFDA (Biçici, 2016a) for instance selection and the machine translation performance prediction system (MTPPS) (Biçici and Way, 2015) for generating features for the training and test sets, mapping both to the same space, where the total number of features in each task is 368.
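The abstract mentions averaging prediction scores with weights based on training performance. The exact weighting scheme is not given in this text, so the following is only a minimal sketch under an assumption: each predictor's weight is its Pearson correlation with the gold scores on the training data, clipped at zero and normalized to sum to one. The function names `pearson`, `performance_weights`, and `weighted_average` are hypothetical and do not come from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def performance_weights(train_preds, train_gold):
    """One weight per predictor, proportional to its training correlation.

    Assumption: correlation is the performance measure; negative values
    are clipped to zero so a poor predictor cannot flip the average.
    """
    scores = [max(pearson(preds, train_gold), 0.0) for preds in train_preds]
    total = sum(scores)
    if total == 0:
        # No predictor is informative: fall back to a plain average.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_average(test_preds, weights):
    """Combine per-predictor test predictions into one score per instance."""
    n_items = len(test_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, test_preds))
        for i in range(n_items)
    ]
```

In use, `train_preds` would hold one list of training-set predictions per learning model, and `test_preds` the matching lists of test-set predictions in the same model order.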
The new features we include are about punctuation: the number of punctuation tokens (Kozlova et al., 2016) and the cosine between the punctuation vectors. RTMs provide a language-independent text processing and machine learning model that can use predictions from different predictors. We use ridge regression (RR), k-nearest neighbors (KNN), support vector regression (SVR), AdaBoost (Freund and Schapire, 1997), and extremely randomized trees (TREE) (Geurts et al., 2006) as learning models in combination with feature selection (FS) (Guyon et al., 2002) and partial least squares (PLS) (Wold et al., 1984). For most of the models, we use scikit-learn.1 We optimize the models using a subset of the training data for the following parameters: λ for RR, k for KNN, γ, C, and ε for SVR, minimum number of samples for leaf nodes and for splitting an in...

Figure 1: RTM depiction: ParFDA selects interpretants close to the training and test data using a parallel corpus in bilingual settings and a monolingual corpus in the target language, or just the monolingual target corpus in monolingual settings; one MTPPS uses interpretants and training data to generate training features and another uses interpretants and test data to generate test features in the same feature space; learning and prediction take place with these features as input.

1 http://scikit-learn.org/. For RR, contains different solvers, support for sparse matrices, and checks for size and errors.
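The punctuation features described above (a punctuation count and the cosine between punctuation vectors) can be illustrated with a small sketch. This is not the MTPPS implementation: it assumes a punctuation vector is a count of each ASCII punctuation character in a sentence, counted at the character level, and the names `punctuation_vector`, `cosine`, and `punctuation_features` are hypothetical.

```python
import math
import string
from collections import Counter

# Assumption: only ASCII punctuation is counted. Arabic or Spanish marks
# such as "،" or "¿" would need to be added for the other STS languages.
PUNCT = set(string.punctuation)


def punctuation_vector(sentence):
    """Count how often each punctuation character occurs in the sentence."""
    return Counter(ch for ch in sentence if ch in PUNCT)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def punctuation_features(s1, s2):
    """Punctuation counts for each sentence and the cosine between their vectors."""
    v1 = punctuation_vector(s1)
    v2 = punctuation_vector(s2)
    return {
        "punct_count_s1": sum(v1.values()),
        "punct_count_s2": sum(v2.values()),
        "punct_cosine": cosine(v1, v2),
    }
```

For a sentence pair such as `punctuation_features("Hello, world!", "Hi, there!")`, both sentences contain one comma and one exclamation mark, so the two punctuation vectors point in the same direction.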


Similar resources

RTM-DCU: Predicting Semantic Similarity with Referential Translation Machines

We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model effectively judging monolingual and bilingual similarity while identifying translation acts between any two data sets with respect to interpretants. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain spec...

RTM-DCU: Referential Translation Machines for Semantic Similarity

We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs judge the quality or the semantic similarity of text by using re...

Referential Translation Machines for Predicting Translation Quality and Related Statistics

We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource. We improve our RTM models with the ParFDA instance selection model (Biçici et al., 2015), with additional features for predicting the translation performance,...

RTM at SemEval-2016 Task 1: Predicting Semantic Similarity with Referential Translation Machines and Related Statistics

We use referential translation machines (RTMs) for predicting the semantic similarity of text in both STS Core and Cross-lingual STS. RTMs pioneer a language-independent approach to all similarity tasks and remove the need to access any task- or domain-specific information or resource. RTMs rank 14th out of 26 submissions in Cross-lingual STS. We also present rankings of various prediction tas...

Referential Translation Machines for Predicting Translation Performance

Referential translation machines (RTMs) pioneer a language-independent approach for predicting translation performance and for all similarity tasks, with top performance in both bilingual and monolingual settings, and remove the need to access any task- or domain-specific information or resource. RTMs rank 1st at document level and 4th at sentence level according to mean absolute er...

Publication date: 2017